feat(otel): add token breakdown attributes to conclusion spans by Copilot · Pull Request #26121 · github/gh-aw

Copilot · 2026-04-14T00:44:33Z

sendJobConclusionSpan emitted only gh-aw.effective_tokens — a cost-weighted aggregate — while the full per-type breakdown already existed in /tmp/gh-aw/agent_usage.json (written by parse_token_usage.cjs) but was never read.

Changes

send_otlp_span.cjs: After the gh-aw.effective_tokens block, reads agent_usage.json via the existing readJSONIfExists helper and conditionally pushes four new span attributes (only when value > 0):
- gh-aw.tokens.input
- gh-aw.tokens.output
- gh-aw.tokens.cache_read
- gh-aw.tokens.cache_write
send_otlp_span.test.cjs: New describe("token breakdown enrichment in conclusion span") block covering: all-fields present, file absent, zero-value omission, and invalid JSON — matching the pattern used by the rate-limit enrichment tests.

const agentUsage = readJSONIfExists("/tmp/gh-aw/agent_usage.json") || {};
if (typeof agentUsage.input_tokens === "number" && agentUsage.input_tokens > 0) {
  attributes.push(buildAttr("gh-aw.tokens.input", agentUsage.input_tokens));
}
// … output_tokens, cache_read_tokens, cache_write_tokens

These attributes enable cache-hit-rate panels (cache_read / (cache_read + cache_write)), per-type cost attribution, and fine-grained threshold alerts without requiring step summary HTML.

Read agent_usage.json in sendJobConclusionSpan and emit gh-aw.tokens.input, gh-aw.tokens.output, gh-aw.tokens.cache_read, and gh-aw.tokens.cache_write span attributes. Closes #<issue> Agent-Logs-Url: https://github.com/github/gh-aw/sessions/5f05f55a-111f-459e-9aa5-34bca00d4a14 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot

Pull request overview

Adds per-token-type breakdown attributes to the job conclusion OTLP span by reading the existing /tmp/gh-aw/agent_usage.json, enabling more detailed observability (cache hit rate, per-type cost attribution, and alerting) than the current gh-aw.effective_tokens aggregate alone.

Changes:

Enriches sendJobConclusionSpan with four new conditional span attributes: input/output/cache read/cache write tokens.
Updates sendJobConclusionSpan JSDoc to document the additional runtime file.
Adds a focused test suite covering presence/absence, zero-value omission, and invalid JSON handling for agent_usage.json.

Show a summary per file

File	Description
actions/setup/js/send_otlp_span.cjs	Reads `agent_usage.json` and conditionally emits `gh-aw.tokens.*` attributes on conclusion spans.
actions/setup/js/send_otlp_span.test.cjs	Adds tests validating token breakdown enrichment behavior in conclusion spans.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Files reviewed: 2/2 changed files
Comments generated: 0

github-actions · 2026-04-14T00:58:41Z

🧪 Test Quality Sentinel Report

Test Quality Score: 83/100

✅ Excellent test quality

Metric	Value
New/modified tests analyzed	4
✅ Design tests (behavioral contracts)	4 (100%)
⚠️ Implementation tests (low value)	0 (0%)
Tests with error/edge cases	3 (75%)
Duplicate test clusters	0
Test inflation detected	Yes (108 test lines / 23 prod lines ≈ 4.7:1)
🚨 Coding-guideline violations	None

Test Classification Details

Test	File	Classification	Issues Detected
`includes all four token breakdown attributes when agent_usage.json is present`	`actions/setup/js/send_otlp_span.test.cjs`	✅ Design	Happy path only; verifies all 4 OTLP attributes carry correct values
`omits all token breakdown attributes when agent_usage.json is absent`	`actions/setup/js/send_otlp_span.test.cjs`	✅ Design	Edge case: ENOENT; verifies no spurious attributes emitted
`omits a token attribute when its value is zero`	`actions/setup/js/send_otlp_span.test.cjs`	✅ Design	Edge case: zero-value filtering boundary
`omits token breakdown attributes when agent_usage.json contains invalid JSON`	`actions/setup/js/send_otlp_span.test.cjs`	✅ Design	Error case: malformed input; verifies graceful degradation

Flagged Tests — Requires Review

No tests require immediate review. One non-blocking observation is noted below.

i️ Test inflation ratio (informational)

actions/setup/js/send_otlp_span.test.cjs added 108 lines while the production file send_otlp_span.cjs added 23 lines (ratio ≈ 4.7:1, exceeding the 2:1 threshold). This triggers a 10-point score deduction per the rubric, but in context it is expected: each test case requires significant boilerplate (mock setup, fetch stub, env vars, full OTLP response parsing). The coverage is genuine and the overhead is justified by the 4 distinct behavioral scenarios tested.

Language Support

Tests analyzed:

🐹 Go (*_test.go): 0 tests (no Go test files changed)
🟨 JavaScript (*.test.cjs): 4 tests (vitest)

Verdict

✅ Check passed. 0% of new tests are implementation tests (threshold: 30%). All 4 new tests verify observable OTLP span output — the attributes emitted to telemetry backends — under four meaningful scenarios: all tokens present, file absent, zero-value tokens omitted, and invalid JSON tolerated gracefully. The vi.spyOn(fs, "readFileSync") mocking targets external filesystem I/O, which is legitimate per project guidelines.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

Assert on observable outputs, return values, or state changes
Cover error paths and boundary conditions
Would catch a behavioral regression if deleted
Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

Assert on internal function calls (mocking internals)
Only test the happy path with typical inputs
Break during legitimate refactoring even when behavior is correct
Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

🧪 Test quality analysis by Test Quality Sentinel · ● 536.7K · ◷

github-actions

✅ Test Quality Sentinel: 83/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 4 new tests are behavioral contracts that verify observable OTLP span attributes under distinct scenarios (happy path, absent file, zero-value filtering, invalid JSON). No coding-guideline violations detected.

Initial plan

62e8708

Copilot AI assigned Copilot and pelikhan Apr 14, 2026

Copilot started work on behalf of pelikhan April 14, 2026 00:44 View session

Copilot AI linked an issue Apr 14, 2026 that may be closed by this pull request

[otel-advisor] add token breakdown (input/output/cache) attributes to conclusion spans #26092

Closed

8 tasks

Copilot AI changed the title ~~[WIP] Add token breakdown attributes to conclusion spans~~ feat(otel): add token breakdown attributes to conclusion spans Apr 14, 2026

Copilot AI requested a review from pelikhan April 14, 2026 00:52

Copilot finished work on behalf of pelikhan April 14, 2026 00:52

pelikhan marked this pull request as ready for review April 14, 2026 00:53

Copilot AI review requested due to automatic review settings April 14, 2026 00:53

Copilot started reviewing on behalf of pelikhan April 14, 2026 00:54 View session

Copilot AI reviewed Apr 14, 2026

View reviewed changes

github-actions bot mentioned this pull request Apr 14, 2026

[aw] No-Op Runs #25214

Open

github-actions bot approved these changes Apr 14, 2026

View reviewed changes

pelikhan merged commit c778e52 into main Apr 14, 2026
77 of 78 checks passed

pelikhan deleted the copilot/add-token-breakdown-to-spans branch April 14, 2026 01:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(otel): add token breakdown attributes to conclusion spans#26121

feat(otel): add token breakdown attributes to conclusion spans#26121
pelikhan merged 2 commits intomainfrom
copilot/add-token-breakdown-to-spans

Copilot AI commented Apr 14, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

github-actions bot commented Apr 14, 2026

Uh oh!

github-actions bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Copilot AI commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

github-actions bot commented Apr 14, 2026

🧪 Test Quality Sentinel Report

Test Quality Score: 83/100

Test Classification Details

Flagged Tests — Requires Review

i️ Test inflation ratio (informational)

Language Support

Verdict

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Apr 14, 2026 •

edited

Loading